74 research outputs found

    Sample Efficient Monte Carlo Tree Search for Robotics

    Artificially intelligent agents that behave like humans have become a defining theme and one of the main goals driving the rapid development of deep learning, particularly reinforcement learning (RL), in recent years. Monte-Carlo Tree Search (MCTS) is a class of methods for solving complex decision-making problems through the synergy of Monte-Carlo planning and RL. MCTS has yielded impressive results in Go (AlphaGo), Chess (AlphaZero), and video games, and it has been further exploited successfully in motion planning, autonomous car driving, and autonomous robotic assembly tasks. Many of the MCTS successes rely on coupling MCTS with neural networks trained using RL methods such as Deep Q-Learning, to speed up learning in large-scale problems. Although MCTS achieves state-of-the-art performance, the highly combinatorial nature of the problems it commonly addresses requires efficient exploration-exploitation strategies for navigating the planning tree and quickly converging value backup methods. Furthermore, large-scale problems such as Go and Chess require a sample-efficient method to build an effective planning tree, which is crucial in on-the-fly decision making. These problems are particularly evident in recent advances that combine MCTS with deep neural networks for function approximation. In addition, despite the recent success of applying MCTS to various autonomous robotics tasks, most of the scenarios are partially observable and require an advanced planning method in complex, unstructured environments. This thesis aims to tackle the following question: How can robots plan efficiently under highly stochastic dynamics and partial observability? The thesis addresses this question in four parts. First, we propose a novel backup strategy that uses the power mean operator, which computes a value between the average and maximum value.
We call our new approach Power Mean Upper Confidence bound Tree (Power-UCT). We theoretically analyze our method, providing guarantees of convergence to the optimum. We then empirically demonstrate the effectiveness of our method in well-known Markov decision process (MDP) and partially observable Markov decision process (POMDP) benchmarks, showing significant improvement in terms of sample efficiency and convergence speed w.r.t. state-of-the-art algorithms. Second, we investigate an efficient exploration-exploitation planning strategy by providing a comprehensive theoretical convex regularization framework in MCTS. We derive the first regret analysis of regularized MCTS, showing that it guarantees an exponential convergence rate. Subsequently, we exploit our theoretical framework to introduce novel regularized backup operators for MCTS based on the relative entropy of the policy update and, more importantly, on the Tsallis entropy of the policy, for which we prove superior theoretical guarantees. Afterward, we empirically verify the consequences of our theoretical results on a toy problem. We then show how our framework can easily be incorporated in AlphaGo, and we empirically show the superiority of convex regularization, w.r.t. representative baselines, on well-known RL problems across several Atari games. Third, we take a further step to draw the connection between the two methods, Power-UCT and convex regularization in MCTS, providing a rigorous theoretical study on the effectiveness of α-divergence in online Monte-Carlo planning. We show how the two methods can be related by using α-divergence. We additionally provide an in-depth study on the range of the α parameter that helps to trade off exploration and exploitation in MCTS, hence showing how α-divergence can achieve state-of-the-art results in complex tasks. Finally, we investigate a novel algorithmic formulation of the popular MCTS algorithm for robot path planning.
Notably, we study Monte-Carlo Path Planning (MCPP) by analyzing and proving, on the one hand, its exponential convergence rate to the optimal path in fully observable MDPs, and on the other hand, its probabilistic completeness for finding feasible paths in POMDPs (proof sketch) assuming limited distance observability. Our algorithmic contribution allows us to employ recently proposed variants of MCTS with different exploration strategies for robot path planning. Our experimental evaluations in simulated 2D and 3D environments with a 7-degrees-of-freedom (DOF) manipulator and in a real-world robot path planning task demonstrate the superiority of MCPP in POMDP tasks. In summary, this thesis proposes and analyzes novel value backup operators and policy selection strategies, from both theoretical and experimental perspectives, to help cope with sample efficiency and exploration-exploitation trade-off problems in MCTS, and brings these advanced methods to robot path planning, showing their superiority in POMDPs w.r.t. state-of-the-art methods.
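The abstract above describes the power mean operator only as computing "a value between the average and maximum value". A minimal sketch of a weighted power mean with that property follows. The function name `power_mean`, the use of visit counts as weights, and the restriction to non-negative values are assumptions made for illustration; this is not the thesis's Power-UCT implementation.

```python
import math


def power_mean(values, weights, p):
    """Weighted power mean of non-negative values (illustrative sketch).

    p = 1 gives the weighted average; p = math.inf gives the maximum
    over entries with positive weight; intermediate p lies in between.
    """
    if p < 1:
        raise ValueError("this sketch only covers p >= 1")
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(v < 0 for v in values):
        raise ValueError("values must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    if math.isinf(p):
        # Limit case: the largest value that carries any weight.
        return max(v for v, w in zip(values, weights) if w > 0)
    # Normalise the weights, average the p-th powers, take the p-th root.
    return sum((w / total) * v ** p for v, w in zip(values, weights)) ** (1.0 / p)


# Hypothetical child value estimates and visit counts, for illustration only.
child_values = [0.2, 0.8]
visit_counts = [1, 1]
average = power_mean(child_values, visit_counts, 1)
intermediate = power_mean(child_values, visit_counts, 4)
maximum = power_mean(child_values, visit_counts, math.inf)
```

Raising `p` moves the backed-up value from the average toward the maximum, which is the interpolation the abstract refers to.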

    Monte-Carlo tree search with uncertainty propagation via optimal transport

    This paper introduces a novel backup strategy for Monte-Carlo Tree Search (MCTS) designed for highly stochastic and partially observable Markov decision processes. We adopt a probabilistic approach, modeling both value and action-value nodes as Gaussian distributions. We introduce a novel backup operator that computes value nodes as the Wasserstein barycenter of their action-value child nodes, thus propagating the uncertainty of the estimate across the tree to the root node. We study our novel backup operator when using a novel combination of $L^1$-Wasserstein barycenter with $\alpha$-divergence, by drawing a notable connection to the generalized mean backup operator. We complement our probabilistic backup operator with two sampling strategies, based on optimistic selection and Thompson sampling, obtaining our Wasserstein MCTS algorithm. We provide theoretical guarantees of asymptotic convergence to the optimal policy, and an empirical evaluation on several stochastic and partially observable environments, where our approach outperforms well-known related baselines.
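To make the idea of a barycenter backup over Gaussian nodes concrete, here is a minimal sketch for a simpler, standard case: the 2-Wasserstein barycenter of one-dimensional Gaussians, whose mean and standard deviation are the weighted averages of the children's means and standard deviations. This is a swapped-in textbook case, not the paper's $L^1$-Wasserstein operator with $\alpha$-divergence; the function name `gaussian_w2_barycenter` and the example numbers are assumptions for illustration.

```python
def gaussian_w2_barycenter(means, stds, weights):
    """2-Wasserstein barycenter of 1-D Gaussians N(mean_i, std_i**2).

    In one dimension the barycenter is again Gaussian, with mean and
    standard deviation equal to the weighted averages of the inputs.
    Returns the pair (mean, std).
    """
    if not (len(means) == len(stds) == len(weights)):
        raise ValueError("means, stds and weights must have the same length")
    if any(s < 0 for s in stds):
        raise ValueError("standard deviations must be non-negative")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    mean = sum(w * m for w, m in zip(weights, means)) / total
    std = sum(w * s for w, s in zip(weights, stds)) / total
    return mean, std


# Hypothetical action-value children: (mean, std) estimates with equal weight.
node_mean, node_std = gaussian_w2_barycenter([0.0, 2.0], [1.0, 3.0], [1.0, 1.0])
```

Because the standard deviation is backed up alongside the mean, the parent node keeps an uncertainty estimate that a selection rule could use, for example optimistic selection or Thompson sampling as named in the abstract.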

    Explain by Evidence: An Explainable Memory-based Neural Network for Question Answering

    Interpretability and explainability of deep neural networks are challenging due to their scale, complexity, and the agreeable notions on which the explaining process rests. Previous work, in particular, has focused on representing internal components of neural networks through human-friendly visuals and concepts. On the other hand, in real life, when making a decision, humans tend to rely on similar situations and/or associations from the past. Hence, arguably, a promising approach to making the model transparent is to design it so that it explicitly connects the current sample with previously seen ones and bases its decision on these samples. Grounded on that principle, we propose in this paper an explainable, evidence-based memory network architecture, which learns to summarize the dataset and extract supporting evidence to make its decision. Our model achieves state-of-the-art performance on two popular question answering datasets (i.e., TrecQA and WikiQA). Via further analysis, we show that this model can reliably trace the errors it has made in the validation step to the training instances that might have caused these errors. We believe that this error-tracing capability provides significant benefit in improving dataset quality in many applications. Comment: Accepted to COLING 202

    Study on chemical constituents of the lichen Parmotrema sancti-angelii (Lynge) Hale. (Parmeliaceae)

    Lichens are fungal and algal/cyanobacterial symbioses resulting in the production of specific metabolites. Parmotrema sancti-angelii (Lynge) Hale is a lichen that has not been well studied chemically or biologically. For the lichen collected in Vietnam, colour reactions for identification of lichen substances (+K red, +P yellow, -C, +KC red) suggested the presence of quinones, depsides and xanthones containing two free hydroxyl groups in meta-position, and depsides and depsidones containing an aldehyde group. Chemical constituent study led to the isolation of three compounds: methyl β-orcinolcarboxylate (1), salazinic acid (2) and atranorin (3). Their structures were confirmed unambiguously by X-ray diffraction and spectroscopic data, and by comparison with those in references. This is the first report of salazinic acid in this lichen. Keywords. Parmeliaceae, Parmotrema sancti-angelii, X-ray, NMR, salazinic acid

    A Bibliometric Analysis of the Global Research Trend in Child Maltreatment

    Child maltreatment remains a major health threat globally that requires the understanding of socioeconomic and cultural contexts to craft effective interventions. However, little is known about research agendas globally and the development of knowledge-producing networks in this field of study. This study aims to provide a bibliometric overview of child maltreatment publications to understand their growth from 1916 to 2018. Data from the Web of Science Core Collection were collected in May 2018. Only research articles and reviews written in the English language were included, with no restrictions by publication date. We analyzed publication years, number of papers, journals, authors, keywords and countries, and presented country collaboration and keyword co-occurrence analyses. From 1916 to 2018, 47,090 papers (53.0% in 2010–2018) were published in 9442 journals. Child Abuse & Neglect (2576 papers, 5.5%), Children and Youth Services Review (1130 papers, 2.4%) and Pediatrics (793 papers, 1.7%) published the most papers. The most common research areas were Psychology (16,049 papers, 34.1%), Family Studies (8225 papers, 17.5%), and Social Work (7367 papers, 15.6%). Among 192 countries with research publications, the most prolific countries were the United States (26,367 papers), England (4676 papers), Canada (3282 papers) and Australia (2664 papers). We identified 17 authors who had more than 60 scientific items. The most cited papers (with at least 600 citations) were published in 29 journals, headed by the Journal of the American Medical Association (JAMA) (7 papers) and the Lancet (5 papers). This overview of global research in child maltreatment indicated an increasing trend in this topic, with the world’s leading centers located in the Western countries led by the United States.
We call for interdisciplinary research approaches to evaluating and intervening on child maltreatment, with a focus on low- and middle-income country (LMIC) settings and specific contexts.